Skip to content

Harden RocksDB snapshot error handling.#9770

Open
vyalamar wants to merge 2 commits intoapache:masterfrom
vyalamar:rocksdb-hardening-L5
Open

Harden RocksDB snapshot error handling.#9770
vyalamar wants to merge 2 commits intoapache:masterfrom
vyalamar:rocksdb-hardening-L5

Conversation

@vyalamar
Copy link
Contributor

@vyalamar vyalamar commented Feb 15, 2026

Summary

  • Replace RuntimeException TODOs in snapshot RocksDB collections with SnapshotStorageException.
  • Wrap compaction DAG/log paths in RocksDBCheckpointDiffer with typed exceptions.
  • Add operator runbook: troubleshooting-rocksdb.md.

Testing

  • mvn -pl hadoop-hdds/rocksdb-checkpoint-differ -DskipITs test (fails: missing local SNAPSHOT deps; needs -am build).

@vyalamar vyalamar changed the title Harden RocksDB snapshot error handling and add troubleshooting doc Harden RocksDB snapshot error handling. Feb 15, 2026
@vyalamar vyalamar marked this pull request as ready for review February 15, 2026 10:43
} catch (IOException | RocksDBException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromRocksDB("append list entry", toRocks(exception));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'append' -> 'Append' to be consistent with other messages.

} catch (IOException | RocksDBException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromRocksDB("read list entry", toRocks(exception));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'read' -> 'Failed to read'

} catch (IOException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromIO("deserialize list entry", exception);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'deserialize' -> 'Failed to deserialize'

} catch (IOException | RocksDBException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromRocksDB("read map entry", toRocks(exception));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'read' -> 'Failed to read'

} catch (IOException | RocksDBException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromRocksDB("write map entry", toRocks(exception));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'write' -> 'Failed to write'

} catch (IOException | RocksDBException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromRocksDB("delete map entry", toRocks(exception));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 'delete' -> 'Failed to delete'

throw new RuntimeException(exception);
}
} catch (IOException exception) {
throw SnapshotStorageException.fromIO("deserialize map entry", exception);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'deserialize' -> 'Failed to deserialize'

};
}

private RocksDBException toRocks(Exception e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code duplication, should be moved to the Utility class.

} catch (IOException | RocksDBException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromRocksDB("add set entry", toRocks(exception));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'add' -> 'Failed to add'

} catch (IOException exception) {
// TODO: [SNAPSHOT] Fail gracefully.
throw new RuntimeException(exception);
throw SnapshotStorageException.fromIO("deserialize set entry", exception);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'deserialize' -> 'Failed to deserialize'

};
}

private RocksDBException toRocks(Exception e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicated code, should be moved to the utility class


public static SnapshotStorageException fromRocksDB(String op,
RocksDBException e) {
return new SnapshotStorageException("Failed to " + op,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding prefix to the message inside ctor would prevent developers from grepping the message in code.


public static SnapshotStorageException fromIO(String op,
java.io.IOException e) {
return new SnapshotStorageException("Failed to " + op,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding prefix to the message inside ctor would prevent developers from grepping the message in code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants